Experience with Fine-Grain Communication in EM-X Multiprocessor for Parallel Sparse Matrix Computation
Authors
Abstract
Sparse matrix problems require a communication paradigm different from those used in conventional distributed-memory multiprocessors. In this paper we present how fine-grain communication helps obtain high performance on EM-X, an experimental distributed-memory multiprocessor developed at ETL that handles fine-grain communication very efficiently. The sparse matrix kernel Conjugate Gradient (CG) is selected for the experiments; among its steps, we focus on sparse matrix-vector multiplication. Several communication methods are developed for performance comparison, including coarse-grain and fine-grain implementations. Fine-grain communication allows exact data access in an unstructured problem, reducing the amount of communication. While CG presents a bottleneck in the form of a large number of fine-grain remote reads, the multithreaded principles of execution are designed to tolerate such latency. Experimental results indicate that the performance of the fine-grain read implementation is comparable to that of the coarse-grain implementation on 64 processors. The results demonstrate that fine-grain communication can be a viable and efficient approach to unstructured sparse matrix problems on large-scale distributed-memory multiprocessors.
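The kernel the abstract centers on, sparse matrix-vector multiplication inside CG, can be sketched as follows. This is an illustrative sequential CSR (compressed sparse row) version, not the paper's EM-X implementation; all names are hypothetical. The inner indexed access into `x` is exactly the data-dependent access that becomes a fine-grain remote read on a distributed-memory machine.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and vals[k] give the column and value of nonzero k.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # col_idx[k] is an indirect, unstructured access into x:
            # on a distributed-memory machine, each such access to a
            # non-local entry is a fine-grain remote read.
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y
```

Because the access pattern into `x` is known only from the nonzero structure, fine-grain reads fetch exactly the entries needed, whereas a coarse-grain scheme must gather and exchange larger blocks.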
Related Papers
Constrained Fine-Grain Parallel Sparse Matrix Distribution
We consider how to distribute sparse matrices among processors to reduce communication cost in parallel sparse matrix computations, in particular, sparse matrix-vector multiplication. We allow 2d distributions, where the distribution (partitioning) is not constrained to just rows or columns. The fine-grain model is a 2d distribution introduced in [2] where nonzeros can be assigned to processors...
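The fine-grain 2d model described above assigns each nonzero to a processor individually, rather than by whole rows or columns. The sketch below only illustrates the shape of such an assignment; the simple cyclic rule is a hypothetical placeholder, since real partitioners choose the mapping to minimize communication.

```python
def assign_nonzeros(nonzeros, p):
    """Map each nonzero coordinate (i, j) to one of p processors.

    In the fine-grain model no row or column constraint applies, so any
    nonzero may go to any processor. The rule below is a placeholder;
    an actual partitioner would optimize this mapping.
    """
    owner = {}
    for i, j in nonzeros:
        owner[(i, j)] = (i + j) % p  # hypothetical cyclic assignment
    return owner
```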
A Nested Dissection Approach to Sparse Matrix Partitioning for Parallel Computations
We consider how to distribute sparse matrices among processes to reduce communication costs in parallel sparse matrix computations, specifically, sparse matrix-vector multiplication. Our main contributions are: (i) an exact graph model for communication with general (two-dimensional) matrix distribution, and (ii) a recursive partitioning algorithm based on nested dissection (substructuring). We...
Minimizing Communication Cost in Fine-Grain Partitioning of Sparse Matrices
We show a two-phase approach for minimizing various communication-cost metrics in fine-grain partitioning of sparse matrices for parallel processing. In the first phase, we obtain a partitioning with the existing tools on the matrix to determine the computational loads of the processors. In the second phase, we try to minimize the communication-cost metrics. For this purpose, we develop communication...
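One common communication-cost metric for a partitioned SpMV is total volume: for each vector entry x_j, count the distinct processors that hold a nonzero in column j and therefore need x_j, beyond its owner. The sketch below is a minimal illustration under assumed ownership conventions, not the metric computation from the paper.

```python
def total_volume(nonzero_owner, vec_owner):
    """Total communication volume for distributed y = A @ x.

    nonzero_owner: {(i, j): processor} -- fine-grain nonzero assignment.
    vec_owner:     {j: processor}      -- who owns vector entry x_j.
    Each processor that touches column j but does not own x_j must
    receive x_j once, so it contributes one word of volume.
    """
    consumers = {}
    for (i, j), p in nonzero_owner.items():
        consumers.setdefault(j, set()).add(p)
    volume = 0
    for j, procs in consumers.items():
        volume += len(procs - {vec_owner.get(j)})
    return volume
```

Other metrics in the literature (maximum per-processor volume, message counts) can be computed from the same `consumers` map by aggregating per processor instead of summing.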
Combinatorial Algorithms for Parallel Sparse Matrix Distributions
Combinatorial algorithms have long played a crucial enabling role in parallel computing. An important problem in sparse matrix computations is how to distribute the sparse matrix and vectors among processors. We review graph, bipartite graph, and hypergraph models for both 1d (row or column) distributions and 2d distributions. A valuable tool is hypergraph partitioning. We present results using...
Combinatorial Problems in High-Performance Computing: Partitioning
Partitioning is of fundamental importance in high-performance computing: partitioning the data and the associated computational work in an optimal manner leads to good load balance and minimal communication in parallel computations on modern architectures. Often, the computation is irregular and the data set is described by a sparse matrix, a graph, or a hypergraph. This results in a combinator...